I love Star Wars. I love the storytelling and fantasy, but I especially love the music. John Williams is amazing. There was a podcast out there called Star Wars Oxygen that covered the music of Star Wars and it was one of my favorite podcasts of all time. Jimmy Mac hosted while voice actor, musician, and composer David W. Collins broke down the scores for the films we know and love in a way that gave me a new appreciation for the films. I say there was a podcast because it went dark following the release of Rogue One. After 38 wonderful volumes the podcast just wasn’t updated anymore, and we the fans have not heard anything about why they stopped producing the show.
I also love statistics and ecology, which is the study of how organisms relate to each other and their environments. One exciting area of research deals with species diversity, which is how many species are found in and among sites. We can use statistics to figure out how many things live in a certain area and to compare how similar or different habitats are to one another. In order to conduct an analysis like this you need a “count matrix,” which has habitats in the rows and species in the columns. The cells are filled in with counts of how many individuals of each species are found in each habitat. An example of a count matrix could look like this, where butterfly species are in columns and different habitats are in rows:
| Site | Danaus plexippus | Vanessa cardui | Adelpha bredowii |
|---|---|---|---|
| Donner Pass | 5 | 6 | 0 |
| Sierraville | 4 | 2 | 2 |
| Davis | 0 | 0 | 3 |
In this example, we can see that Donner Pass and Sierraville are similar to each other because they share two species (Danaus plexippus and Vanessa cardui). Davis and Sierraville are also somewhat similar because they have one species in common (Adelpha bredowii), while Donner Pass and Davis share none. If we were going to group these sites based on similarity, Donner Pass and Sierraville would be more similar to each other than either is to Davis.
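To make that comparison concrete, here is a minimal Python sketch that stores the toy count matrix and lists the species each pair of sites has in common. The names `species`, `counts`, and `shared_species` are just illustrative, and counting shared species is only the simplest possible notion of similarity, not necessarily the one used for the plots in this post.

```python
from itertools import combinations

# Toy count matrix from the table above: one row of counts per site,
# with columns in the same order as `species`.
species = ["Danaus plexippus", "Vanessa cardui", "Adelpha bredowii"]
counts = {
    "Donner Pass": [5, 6, 0],
    "Sierraville": [4, 2, 2],
    "Davis": [0, 0, 3],
}


def shared_species(a, b):
    """Return the names of species with a nonzero count at both sites."""
    return [name for name, x, y in zip(species, a, b) if x > 0 and y > 0]


# Compare every pair of sites once.
shared = {
    (site1, site2): shared_species(counts[site1], counts[site2])
    for site1, site2 in combinations(counts, 2)
}

for pair, names in shared.items():
    print(pair, len(names), names)
```

Donner Pass and Sierraville share two species, Sierraville and Davis share one, and Donner Pass and Davis share none, which matches the reading of the table above.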
If we plot these relationships as a tree (after applying some statistics) we see that Donner Pass and Sierraville are most similar to each other (they are connected by a branch). This makes sense, because Donner Pass and Sierraville are about 50 km apart, while both sites are about 160 km from Davis.
Cluster plot of the toy example referred to above.
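For the curious, a tree like this can be built in two steps: measure how dissimilar each pair of sites is, then repeatedly join the closest groups. The sketch below is in Python and rests on two assumptions of mine, since the text above only says "some statistics": it uses Bray-Curtis dissimilarity and average linkage. The function names are hypothetical.

```python
from itertools import combinations

# Toy count matrix: rows are sites, columns are the three butterfly species.
sites = {
    "Donner Pass": [5, 6, 0],
    "Sierraville": [4, 2, 2],
    "Davis": [0, 0, 3],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical counts, 1 = nothing shared."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def average_linkage(rows):
    """Agglomerative clustering; returns merges as (group_a, group_b, distance)."""

    def dist(g1, g2):
        # Average linkage: mean dissimilarity over all cross-group pairs.
        pairs = [(i, j) for i in g1 for j in g2]
        return sum(bray_curtis(rows[i], rows[j]) for i, j in pairs) / len(pairs)

    groups = [(name,) for name in rows]
    merges = []
    while len(groups) > 1:
        # Join the two closest groups and record the height of the join.
        a, b = min(combinations(groups, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        groups = [g for g in groups if g not in (a, b)] + [a + b]
    return merges


merges = average_linkage(sites)
for a, b, d in merges:
    print(a, "+", b, "at", round(d, 3))
```

Under these assumptions Donner Pass and Sierraville are joined first, and Davis joins that pair afterwards at a larger dissimilarity, which is the same shape as the tree described above.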
Please note here that I have created this page using R Markdown in RStudio. For those curious, all of the code and data used to create this post are freely available through this project’s GitHub repository.
During the Star Wars Oxygen podcast, David W. Collins began what he called his “theme tracker,” which was essentially a spreadsheet of the number of times a theme played per film.
David W. Collins made a count matrix.
We can use statistics on count matrices.
We can apply statistics to Star Wars!!!! Oh happy day!!!
To reverse engineer the theme tracker I listened back through all of the Star Wars Oxygen episodes with pencil and paper ready. I made note of how often a theme was played during a particular film every time Mr. Collins mentioned it. In some instances, I had to get a bit of help and I read the breakdowns and threads from these sites:
This was especially helpful when going through Attack of the Clones, which had a lot of music edits.
I then attempted my own impression of David W. Collins and Star Wars Oxygen and went through Rogue One three times and counted each instance of what I thought was a “theme.” I am almost certainly wrong in places because I am not a trained musician, and I might have counted themes as separate entities when they were actually part of the same leitmotif.
The data I ended up with, and which are used here, had:
These data could be wrong or incomplete and are in need of improvement. I am particularly concerned by the lack of “rare” themes in the data set. Rare things can be important in ecology. Help me! Please contribute to the theme tracker. There are a few ways you could contribute:
GitHub (for those with technical skills).

Let’s make a histogram where the total number of appearances each theme makes in the saga is plotted. Hover your cursor over each bar to see what it represents.
Plot of all theme appearances
Let’s make a plot where each film is represented by a bar and that bar is filled according to the frequency of the themes in that movie. Hover your cursor over a bar to see the theme and number of times it appeared in that film. Try clicking on compare data on hover to see all the themes at once.
Themes by film
Now we’ll make a tree depicting the relationships between the seven films of the Star Wars saga just as we did in the toy example above.
First, a prediction about the clustering analysis. I postulate that the three original trilogy films will cluster together, separate from the prequel trilogy films (which will also cluster together). I also predict that The Force Awakens and Rogue One will be more similar to the original trilogy than to the prequels.
Adding the data from Rogue One allows us to see where that film lies in relation to the others. Michael Giacchino rooted the music for Rogue One firmly within Star Wars. He used parts from A New Hope to form the themes used in Rogue One; for example, Jyn Erso’s Suite was based on “the Message,” which plays in the background when Obi-Wan says “You must learn the ways of the Force….” It is also the only Star Wars film to share “Darth Vader’s” theme with A New Hope.
Clustering of the Star Wars films based on their musical theme counts.
This plot shows that the prequel trilogy films do indeed cluster together, and that the original trilogy films cluster together with The Force Awakens. Rogue One is more musically related to the original trilogy, which makes a lot of sense to me because they share a lot of themes. However, the themes shared between A New Hope & The Force Awakens and between The Empire Strikes Back & Return of the Jedi create similarities that are too strong for Rogue One to break, so it lies outside of the group created by Episodes IV through VII.
Jost’s D is a way of counting how many things there are in a certain habitat. The cool thing about Jost’s D is that you can consider how many things there are while accounting for how rare they are (that is the q on the bottom of the plot). Here we count the number of different themes by film and consider how many different themes there are as we change the weight we give to “rarity.”
Plot of the effective number of themes by Star Wars film
To read this plot we look at the y (vertical) axis to see the effective number of themes. The Greek letter alpha (\(\alpha\)) denotes diversity within a single site, which here means within a single film. Along the x (horizontal) axis we have the different weights we place on “rarity,” the q that I mentioned above. A weight of 0 means that all themes count equally, so the value is the total number of themes present in each film. As we move right along the x-axis the effective number of themes decreases because rare themes are given less and less weight. All the way to the right (q = 5) rare themes have hardly any effect on the number of themes.
Note that A New Hope has the fewest themes (when q = 0). This is likely a result of incomplete data in the spreadsheet, or it could reflect the fact that it is the first film. Rogue One actually has the highest total number of themes, but among the films scored by John Williams, Revenge of the Sith has the most themes in our data set. One thing that appears evident from this analysis is that every film has ~6 themes that we hear frequently.
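Here is a small Python sketch of how an effective number like this can be computed. It assumes that Jost's D is the standard "effective number" of order q (also called a Hill number): \(D_q = (\sum_i p_i^q)^{1/(1-q)}\), where \(p_i\) is the share of all appearances that belongs to theme \(i\), with \(q = 1\) handled as a limit. The function name `hill_number` is hypothetical, and the counts are the Sierraville row from the toy butterfly table rather than real theme data.

```python
import math


def hill_number(counts, q):
    """Effective number of types of order q for one row of counts."""
    total = sum(counts)
    # Only types that actually occur contribute to the sum.
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; its limit is
        # the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))


# Sierraville row of the toy count matrix (assumed example data).
sierraville = [4, 2, 2]

# One point per value of q, like reading a single line off the plot.
profile = {q: hill_number(sierraville, q) for q in (0, 1, 2, 5)}
for q, value in profile.items():
    print(q, round(value, 2))
```

At q = 0 the result is simply the number of species present (3 for Sierraville), and the value can only stay level or shrink as q grows, because the two less common species count for less and less.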
One last note of geekery. The colors from that plot were made with an R package called spaceMovie, which uses colors from the Star Wars franchise.
Lastly, I employ another method of visualization called NMDS (Non-Metric MultiDimensional Scaling) which plots the locations of each “habitat” in ordination space. In this case, each film appears on the plot in a place relative to the other films. That is to say, similar things should be closer together than dissimilar things.
NMDS Ordination plot of the Star Wars films.
Think about which films you could draw an ellipse around without including any other films. We could draw an ellipse around the prequel trilogy so that the line only contains the prequels. This suggests that the prequel films are closer to each other than they are to other films. It also appears that The Force Awakens is closest to the original trilogy. These findings are consistent with the clustering plot we saw earlier. What is really cool about this plot is we see just how far away Rogue One is from the rest of the films. Even though Rogue One is related to the episodic films of Star Wars, it is its own thing.
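The "non-metric" part of NMDS means that only the rank order of the dissimilarities matters: the most similar pair should end up closest together on the plot, the least similar pair farthest apart. The Python sketch below does not run NMDS itself. It only checks whether a given layout respects that rank order, using the toy butterfly sites. The Bray-Curtis values and the 2-D coordinates are assumptions made up for illustration, and `rank_order_preserved` is a hypothetical helper.

```python
import math

# Bray-Curtis dissimilarities for the toy butterfly sites (an assumed
# choice of measure), computed by hand from the count matrix.
dissimilarity = {
    ("Donner Pass", "Sierraville"): 7 / 19,
    ("Donner Pass", "Davis"): 1.0,
    ("Sierraville", "Davis"): 7 / 11,
}

# Hypothetical 2-D positions, standing in for an ordination result.
coords = {
    "Donner Pass": (0.0, 0.0),
    "Sierraville": (0.4, 0.1),
    "Davis": (1.0, 0.0),
}


def rank_order_preserved(dissimilarity, coords):
    """True if pairs sorted by dissimilarity match pairs sorted by plotted distance."""
    by_dissimilarity = sorted(dissimilarity, key=dissimilarity.get)
    by_distance = sorted(
        dissimilarity,
        key=lambda pair: math.dist(coords[pair[0]], coords[pair[1]]),
    )
    return by_dissimilarity == by_distance


print(rank_order_preserved(dissimilarity, coords))
```

A real NMDS routine searches for coordinates that satisfy this ordering as closely as possible, which is why only relative positions on the plot carry meaning.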
I have four big takeaways about the music of the Star Wars films based on this exercise:
These results make a lot of sense to me. I interpret them to mean that John Williams kept similar themes throughout each of the two trilogies, and that The Force Awakens builds on the original trilogy, which makes chronological sense. I predict that The Last Jedi will be closely related to The Force Awakens. Lastly, Michael Giacchino used themes found in A New Hope to ground Rogue One in the Star Wars musical universe, but made it his own.